ABSTRACT

Bacterial RNase J and eukaryal cleavage and poly- 
adenylation specificity factor (CPSF-73) are members 
of the β-CASP family of ribonucleases involved in
mRNA processing and degradation. Here we report 
an in-depth phylogenomic analysis that delineates 
aRNase J and archaeal CPSF (aCPSF) as distinct 
orthologous groups and establishes their repartition 
in 110 archaeal genomes. The aCPSF1 subgroup,
which has been inherited vertically and is strictly 
conserved, is characterized by an N-terminal exten- 
sion with two K homology (KH) domains and a 
C-terminal motif involved in dimerization of the holo- 
enzyme. Pab-aCPSF1 (Pyrococcus abyssi homolog) 
has an endoribonucleolytic activity that preferentially 
cleaves at single-stranded CA dinucleotides and 
a 5'-3' exoribonucleolytic activity that acts on 5'
monophosphate substrates. These activities are the 
same as described for the eukaryotic cleavage and 
polyadenylation factor, CPSF-73, when engaged in 
the CPSF complex. The N-terminal KH domains are 
important for endoribonucleolytic cleavage at certain 
specific sites and the formation of stable high 
molecular weight ribonucleoprotein complexes. 
Dimerization of Pab-aCPSF1 is important for exoribonucleolytic
activity and RNA binding. Altogether,
our results suggest that aCPSF1 performs an essential
function and that an enzyme with similar activities
was present in the last common ancestor of Archaea 
and Eukarya. 

INTRODUCTION 

RNA processing and degradation are critical to the 
survival of all cells and acknowledged as a means of 
regulating gene expression. In particular, the nature of
RNA 5' and 3' ends is known to have major impact
because they control the entry and directionality of 
endo- and exoribonucleases involved in these processes. 
In Archaea, exploration of RNA processing and degrad- 
ation pathways is still in its early stages. Because easy 
genetic approaches are not readily available, functional 
studies in the Archaea are often based on genomic and 
proteomic analyses that are interpreted in light of our 
understanding of RNA metabolism in Bacteria and 
Eukarya. Unlike its eukaryal counterpart, archaeal 
mRNA is not capped at its 5'-end nor is it polyadenylated 
at its 3'-end [(1) for review]. Transcription in the Archaea 
is performed by a eukaryal-like RNA polymerase that ini- 
tiates at 'TATA-boxes' [for review see (2)]. However, little 
is known about mRNA 3'-end maturation and transcrip- 
tion termination. 

Processing of the 3'-end is an essential step in converting 
eukaryotic pre-mRNAs to mature polyadenylated 
mRNAs [for review see (3)]. This process is executed by 
the cleavage and polyadenylation macromolecular com- 
plex, which is well-described in both yeast and mammals 
(4-9). Among other proteins, this machinery includes the 
cleavage and polyadenylation specificity factor (CPSF-73), 
a 73kDa subunit that carries out the endonucleolytic 
cleavage at a CA motif 20-30 nt downstream of the
AAUAAA consensus sequence before polyadenylation
(10,11). In the maturation of metazoan histone 
pre-mRNA, CPSF-73 has also been shown to act as a 
5'-3' exoribonuclease in the degradation of the transcript 
downstream of the cleavage site (12). Based on sequence 
similarity, a gene encoding a homolog of the eukaryotic 
CPSF-73 has been reported to be prevalent in archaeal 
genomes (13) raising the question of its role in archaeal 
RNA metabolism. Recently, CPSF-73 homologs in 
Methanocaldococcus jannaschii and Methanothermobacter
thermautotrophicus have been shown to have nuclease 
activity (14-16). However, the enzymatic properties and 
specificities of archaeal CPSF(aCPSF)-73 homologs 
remain to be clearly delineated. 

The eukaryotic CPSF-73 is a member of the β-CASP
family of metallo-β-lactamases (17), which includes ribonucleases
important in RNA metabolism that are widespread
in all three domains of life (18). The β-CASP
proteins have the first four signature motifs of the
metallo-β-lactamase superfamily followed by a distinct
region [β-CASP domain, (19)] that is characterized by
three short conserved motifs A (Asp or Glu), B (His) 
and C (His). These enzymes use a zinc-dependent mech- 
anism in catalysis and act as 5'-3' exonucleases and/or 
endonucleases (19,20). Examples of β-CASP nucleases
include RNase J1, a key player of Bacillus subtilis RNA
metabolism that functions as an endonuclease and a 5'-3' 
exonuclease (21-23). We have shown recently that 
orthologs of bacterial RNase J with 5'-3' exonuclease
activity are widespread in the Euryarchaea (24). Similar
activity has been described for a closely related β-CASP
protein of the crenarchaeon Sulfolobus solfataricus (25). 
Moreover, several crystal structures of aCPSF-73 
homologs in M. thermautotrophicus, Pyrococcus
horikoshii and Methanosarcina mazei have been solved. 
These proteins are dimeric (14,26) and have a tripartite 
architecture consisting of an N-terminal region with two
K homology (KH) RNA-binding motifs, a central
metallo-β-lactamase domain and a C-terminal β-CASP
domain (14,16,26). Altogether, identification of 
homologs of the eukaryotic CPSF-73 and of bacterial 
RNase J in the Archaea raises the question of their role 
in RNA metabolism and their evolutionary origin. 

The metallo-β-lactamase protein superfamily is highly
represented in Archaea (26). Within this superfamily, the archaeal
β-CASP family members were proposed to be RNA
hydrolases because their sequences are more closely related to
bacterial RNase J and to eukaryotic CPSF-73 than to
β-CASP proteins involved in DNA repair and recombination
(19). Here we report the inventory, classification and
phylogenetic analysis of the archaeal β-CASP proteins,
allowing us to identify seven β-CASP clusters. Among
them, one is clearly related to bacterial RNase J
[archaeal RNase J (aRNase J)] and three are related to
CPSF-73. Members of the aCPSF1 group are present in
all Archaea whose genomes have been sequenced. We
show that Pab-aCPSF1 from the thermococcal archaeon
Pyrococcus abyssi has the same activities and specificity as
its eukaryotic counterpart, CPSF-73. Thus, aCPSF1
family members are authentic orthologs of CPSF-73. 

MATERIALS AND METHODS 

Collection of archaeal β-CASP proteins

Genome entries of the complete archaeal and bacterial
genomes were retrieved from EMBL (http://www.ebi.ac.uk/genomes/)
and processed by a set of Perl programs into
a MySQL database. We used the RPS-Blast program to
annotate protein sequences according to the conserved
domain database available at the NCBI
(http://www.ncbi.nih.gov/Structure/cdd/cdd.shtml) and computed
pairs of one-to-one orthologous genes with BlastP as
follows: two genes a and b, from genomes A and B respectively, are
considered to be orthologs if a is the best hit of b in
genome A and reciprocally, and, if a (or b) has a paralogous
gene named c, the score of a versus b is
greater than the score of a (or b) versus c. The proteins of
the β-CASP family have the first four signature motifs of
the metallo-β-lactamase superfamily followed by a distinct
globular domain [named β-CASP domain, (19)] that is
used to identify new members of the family. Callebaut
et al. (19) identified a list of β-CASP family members in
eukaryotes, bacteria and archaea. The archaeal list
included only 14 species and 40 β-CASP candidate
proteins. To update the annotation of the β-CASP
proteins in the 110 complete archaeal genomes, we used
the β-CASP domain of each candidate sequence as query
in a Psi-Blast search against the protein sequences of
complete archaeal genomes. To maximize the sensitivity
of the prediction, we set the maximal E-value to 1e-05 and
the maximal number of iterations to 20 to be able to 
recover all putative candidates in each individual 
Psi-Blast search. This resulted in an initial collection of 
375 proteins. 
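
The one-to-one ortholog rule given above can be illustrated with a short Python sketch. It is not the Perl/MySQL pipeline used in this study: the function names, the score dictionary and the toy gene identifiers are hypothetical, and the scores are assumed to be BlastP scores indexed by (query, subject).

```python
from collections import defaultdict


def best_hits(scores, genome_of):
    """Best-scoring hit of every query gene in each other genome."""
    best = {}
    for (query, subject), score in scores.items():
        target = genome_of[subject]
        if target == genome_of[query]:
            continue  # within-genome hits are paralogs, handled separately
        key = (query, target)
        if key not in best or score > best[key][1]:
            best[key] = (subject, score)
    return best


def one_to_one_orthologs(scores, genome_of):
    """Reciprocal best hits that also outscore any within-genome paralog."""
    best = best_hits(scores, genome_of)
    paralog = defaultdict(float)  # best paralog score per gene (0 if none)
    for (query, subject), score in scores.items():
        if query != subject and genome_of[query] == genome_of[subject]:
            paralog[query] = max(paralog[query], score)
    pairs = set()
    for (a, _), (b, score_ab) in best.items():
        back = best.get((b, genome_of[a]))
        if back is None or back[0] != a:
            continue  # b's best hit in a's genome is not a: not reciprocal
        if score_ab <= paralog[a] or back[1] <= paralog[b]:
            continue  # a paralog scores at least as well as the candidate
        pairs.add(tuple(sorted((a, b))))
    return pairs


# Hypothetical toy data: two paralogs in genome A, one gene in genome B.
genome_of = {"a1": "A", "a2": "A", "b1": "B"}
scores = {
    ("a1", "b1"): 200.0, ("b1", "a1"): 195.0,
    ("a2", "b1"): 80.0, ("b1", "a2"): 75.0,
    ("a1", "a2"): 90.0, ("a2", "a1"): 90.0,
}
print(one_to_one_orthologs(scores, genome_of))
```

With these toy scores only the a1/b1 pair is retained; a2 is rejected because it is not the best hit of b1 in genome A.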

Protein classification 

For each protein of the initial collection, we retrieved 
orthologs in each archaeal genome to obtain orthologous 
protein pairs. Among these, 13 proteins were not
identified by the Psi-Blast search. A graph was produced
where vertices correspond to proteins and edges to 
orthologous relationships. This graph included six connected
components. The application of a partition algorithm
[MCL with an inflation parameter set to 1.2 (27)]
revealed nine well-defined groups of proteins (orthologous 
groups, OGs). The protein members of two groups (GloB, 
COG0491, Zn-dependent hydrolases including glyoxylases 
and MtrA, COG4063, tetrahydromethanopterin 
S-methyltransferase, subunit A) do not have the A, B
and C β-CASP signature motifs. Hence, they were false
positives and were discarded from further analysis. The
other OG proteins possess the signature of the β-CASP
family and fall in three related COGs: COG1782 (pre- 
dicted metal-dependent RNase, consisting of a 
metallo-beta-lactamase domain and an RNA-binding 
KH domain), COG1236 (predicted exonuclease with 
beta-lactamase fold involved in RNA processing) and 
COG0595 (predicted hydrolase of the metallo-beta- 
lactamase superfamily). As the different protein groups 
were not found in all archaeal genomes, we systematically 
searched for putatively unannotated genes (missed by an- 
notation or pseudogenes) with the TblastN program. The 
strategy used to compute the clusters of orthologous 
proteins was validated by the very low level of paralogy 
observed in each group (fewer than three paralogs per
genome). 
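
The first step of this classification, finding the connected components of the orthology graph, can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses a union-find structure rather than the MCL partition applied in this study, the edge list is a toy example built from protein names mentioned in this article, and the helper names are hypothetical.

```python
from collections import Counter


def connected_components(edges):
    """Group proteins linked, directly or indirectly, by orthology edges."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    # Largest group first; ties broken by sorted member names.
    return sorted(groups.values(), key=lambda g: (-len(g), sorted(g)))


def max_copies_per_genome(group, genome_of):
    """Largest number of group members found in a single genome."""
    return max(Counter(genome_of[p] for p in group).values())


# Illustrative edges only; not the actual orthology graph of the study.
edges = [("PAB1868", "MJ1236"), ("MJ1236", "MM0695"), ("PAB1751", "MJ0861")]
genome_of = {"PAB1868": "P. abyssi", "PAB1751": "P. abyssi",
             "MJ1236": "M. jannaschii", "MJ0861": "M. jannaschii",
             "MM0695": "M. mazei"}
groups = connected_components(edges)
print(groups)
print([max_copies_per_genome(g, genome_of) for g in groups])
```

Counting members per genome in each group gives a simple check on the level of paralogy, in the spirit of the validation described above.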

Alignments and trees computation 

The archaeal phylogeny was inferred from a set of 53 
ribosomal proteins using the super matrix approach (28) 
with only one strain per species. Sequences were aligned 
using the MUSCLE program (29). The alignments were 
inspected and manually refined using the SEAVIEW 
sequence editor (30) and trimmed using trimAl (31). 
These parsed alignments were concatenated to produce a
single alignment of 6938 residues. The maximum likeli- 
hood trees were computed with PhyML (32) using the 
LG model of sequence evolution. The gamma-distributed 
substitution rate variation was approximated by four 
discrete categories with shape parameter and proportion 
of invariant sites estimated from the data. Non-parametric 
bootstrap values were computed (100 replications of the 
original dataset) using the same parameters. Trees were 
visualized and annotated with TreeDyn (33). 
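
The concatenation step of the supermatrix approach can be sketched in Python as below. This is a minimal illustration, not the procedure actually run here: the function name and the toy sequences are hypothetical, and filling a taxon that lacks a given protein with gap characters is an assumption of the sketch.

```python
def concatenate_alignments(alignments, gap="-"):
    """Join per-protein alignments ({taxon: aligned sequence}) end to end."""
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("sequences in one alignment must share a length")
        (width,) = widths
        for taxon in taxa:
            # Assumption: a taxon missing from this alignment is gap-filled.
            parts[taxon].append(aln.get(taxon, gap * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Hypothetical toy alignments for two proteins and two taxa.
toy = [
    {"P_abyssi": "MK-L", "M_mazei": "MKAL"},
    {"P_abyssi": "GG"},
]
print(concatenate_alignments(toy))
```

Every row of the result has the same length, which is the sum of the widths of the individual alignments.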

Construction of vectors for the expression of Pab-aCPSF1
and variants

The Supplementary Table S1 summarizes oligonucleotides
used in this study. The plasmid used for expression of
His-tagged Pab-aCPSF1 was pET15b. The coding
sequence (PAB1868) was amplified by PCR from P.
abyssi genomic DNA and cloned as an XhoI-BamHI
PCR fragment using OLC5 and OLC3 oligonucleotides
to give the plasmid pEC-Pab-aCPSF1. The ΔKH and
ΔCter variants were constructed using OLΔKHC5/OLC3
and OLC5/OLΔC3 oligonucleotide pairs. The
H261A and H594A variants were generated by site-directed
mutagenesis of pEC-Pab-aCPSF1 with the appropriate
oligonucleotides (OL1A261/OL2A261; OL1A594/
OL2A594, respectively) and the QuikChange II XL Kit
(Stratagene). 

Overexpression and purification of wild type Pab-aCPSF1
and variants 

The BL21-CodonPlus (DE3) Escherichia coli strain
carrying pEC-Pab-aCPSF1 or plasmids bearing mutations
was induced at an OD600 of 0.6 by addition of 0.1 mM
IPTG, and incubated for 3 h at 30°C. A cell extract was
heated to 70°C for 10 min and clarified by low-speed centrifugation.
TALON Metal resin (Clontech) was used for
IMAC purification of the His-tagged protein. Proteins
(~10 µg/µl) were dialysed against 20 mM Hepes pH 7.5,
300 mM NaCl, 1 mM EDTA, 1 mM DTT and 1% glycerol
and stored at 4°C. The His-tag was removed by treatment
with thrombin (0.5 u/µl) for 2 h at room temperature,
before incubation at 70°C for 15 min and centrifugation
to recover the supernatant. An aliquot of the purified
protein was analysed by Coomassie-stained 10%
SDS-PAGE and migrated at 72 kDa, in agreement with
the predicted molecular mass of 72 kDa for the wild type
protein, and at 55 kDa for the ΔKH variant.

RNA synthesis and labeling 

In vitro transcription with T7 RNA polymerase was 
performed as described by the manufacturer (Promega) 
using PCR fragments as templates. The sR47 and sRkB
templates were prepared as described previously (34,35),
and the sR47MutCG and sR47Mut21U templates using oligonucleotides
OLsR47MutCG5/OLsR47MutCG3 and
OLsR47Mut21U5/OLsR47Mut21U3, respectively
(Supplementary Table S1). [α-32P] UTP and [γ-32P] GTP
were added to the in vitro transcription mix to synthesize
uniformly labeled transcripts and 5'-end-labeled
triphosphorylated RNAs (p*pp RNA), respectively.
5'-end monophosphate RNA labeling was performed on
dephosphorylated RNA or synthetic RNA with T4 polynucleotide
kinase in the presence of [γ-32P] ATP. 3'-end
labeling was carried out with T4 RNA ligase in the presence
of [5'-32P] pCp and DMSO. All labeled RNAs were
purified on denaturing 8 or 10% PAGE.

RNase assay 

A typical enzyme excess reaction in a final volume of 15 µl
contained 5 nM 32P-RNA, 6 µM wild type Pab-aCPSF1 or
variants, 20 mM Hepes, pH 7.5, 100 mM KCl and 1.5 mM
MgCl2. Reactions were started by addition of the enzyme
and incubated at 65°C; each assay was repeated in at least three
independent experiments. Samples of 4 µl were withdrawn at
the indicated times and the reactions were stopped by incubation
with proteinase K (20 u) for 10 min at 37°C
before addition of formamide-containing dye supplemented
with 10 mM EDTA, or spotted directly on thin
layer chromatography (TLC) plates (PEI-cellulose,
Nagel). The samples and T1/OH ladders were denatured
for 1 min at 95°C before separation on 10% PAGE/8 M
urea sequencing gels. TLC plates were developed with
0.25 M KH2PO4 and gels were dried before analysis
using PhosphorImager and MultiGauge software.

Electrophoretic mobility shift assay 

EMSA was performed as previously described (34). RNA
and ribonucleoprotein (RNP) complexes were separated 
on a native 5% (19:1) polyacrylamide gel containing 
0.5x TBE and 5% glycerol. Electrophoresis was per- 
formed at room temperature at 250V in 0.5x TBE 
running buffer containing 5% glycerol. The gels were 
dried and visualized using a Fuji-Bas 1000
PhosphorImager.

Size exclusion chromatography 

After IMAC purification, Pab-aCPSF1 and its variants
were concentrated by ultrafiltration (Millipore Amicon
Ultra 30 K) and loaded onto a Superdex 200 10/300GL 
gel filtration column (GE Healthcare), pre-equilibrated in 
20 mM HEPES (pH 7.5), 300 mM NaCl, 1 mM DTT, 
1 mM EDTA and 1% glycerol. The protein standard kit 
(GE Healthcare) containing ferritin (440 kDa), aldolase
(158 kDa), conalbumin (75 kDa), ovalbumin (43 kDa) 
and ribonuclease A (13.7 kDa) was used to estimate the 
molecular mass of the native protein. The flow rate was 
fixed at 0.5 ml/min and elution of the protein was monitored
by absorbance at 280 nm.

RESULTS 

Groups of orthologous β-CASP proteins in the Archaea

We undertook a detailed analysis of all archaeal β-CASP
members to define OGs and to elucidate their evolutionary
relationships. We collected β-CASP sequences from 110
complete archaeal genomes and classified them based on
sequence conservation as described in Methods. Of the
nine clusters that emerged from this analysis
(Supplementary Figure S1A), members of the GloB and MtrA
clusters did not have the conserved A, B and C motifs
characteristic of β-CASP proteins; members of the αβCx
cluster had non-canonical spacing between the A and B
sequence motifs (Supplementary Figure S1B); members of
the αβCy and αβCz clusters were not monophyletic, suggesting
complex evolution that might include horizontal
gene transfers with bacteria. These OGs were not further
considered here. Using the remaining four groups of
orthologous β-CASP proteins, we constructed a tree that
was rooted with eukaryotic CPSF-73 and bacterial RNase
J (Figure 1A). These OGs, which we named aRNase J,
aCPSF1, aCPSF1b and aCPSF2, correspond to distinct
and specific subtrees. This configuration validates our
classification procedure. Apart from the members of the
major aCPSF1 group harboring an N-terminal extension
of 110 amino acids, archaeal β-CASP candidates are
commonly restricted to the β-CASP and metallo-β-lactamase
core domains (composed of an average of 420
amino acids) with no additional N- or C-terminal extensions
(see below). Note that the S. solfataricus aCPSF2
member (Sso0386), with a 40-amino-acid N-terminal
region, is peculiar among the aCPSF2 group.

Bacterial and aRNase J were separated from the 
CPSF-like OGs by a long branch (100% bootstrap 
support), suggesting an early evolutionary separation. 
Bacterial RNase J is distinguished from the archaeal 
homologs by a characteristic C-terminal extension 
(Figure 1B). The 65 aRNase J members were exclusively
present in Euryarchaea as described previously (24) 
(Supplementary Figure S2). Recently, three members of 
this group have been reported to have 5'-3' 
exoribonuclease activity (15,24) (Table 1) and a phylogen- 
etic analysis showed that aRNase J has been inherited 
vertically, suggesting an ancient origin predating the sep- 
aration of Bacteria and Archaea (24). 

Proteins related to eukaryal CPSF-73 clustered into three
OGs: aCPSF1 (112 members) and aCPSF1b (11 members),
corresponding to COG1782, and aCPSF2 (80 members),
corresponding to COG1236. The aCPSF2 OG is distributed
among Crenarchaeota, Euryarchaeota and
Thaumarchaeota (Supplementary Figure S2). One
member of this subgroup, which was misidentified as
an RNase J ortholog, has been reported to have a 5'-3'
exonucleolytic activity (25) (Figure 1A and Table 1). The
aCPSF1 OG corresponds to a highly conserved family with
an N-terminal extension containing two KH RNA-binding
motifs specific to this group, and a C-terminal motif that is
part of a protein dimer interface (Figures 1B and 2). This
OG is notable because of its remarkable conservation in all
Archaea with no exception to date (Supplementary Figure
S2). Moreover, the congruence between the archaeal and
aCPSF1 phylogenetic trees (Figure 3) shows that aCPSF1
has been inherited vertically, suggesting an ancient origin
predating the emergence of Archaea. The small aCPSF1b
OG branching close to the aCPSF1 OG is restricted to the
Methanococcales (Figure 1A and Supplementary
Figure S2). Only one member (MJ0162, misidentified as
an RNase J ortholog) is biochemically characterized and
harbors 5'-3' exonucleolytic activity (15) (Figure 1A and
Table 1). The aCPSF1b proteins, which appear to have an
undecipherable ancient origin, lack the N-terminal extension
that is characteristic of the aCPSF1 family.

The crystal structures of three aCPSF1 members have
recently been reported (14,16,26), as well as endoribonuclease
activity for one of them (15) (Figure 1A and
Table 1). However, little is known about the substrate
specificity and enzymatic properties of the aCPSF1
members. To better characterize the ubiquitous aCPSF1
OG, we investigated the properties of the P. abyssi
member of this group. The P. abyssi genome contains
three open reading frames with β-CASP protein signatures.
PAB1751, PAB1868 and PAB1035 are members of
the aRNase J, the aCPSF1 and the highly divergent αβCy
cluster, respectively (Figure 1A and Supplementary Figure
S1A). PAB1035 was not further considered here.
PAB1751, denoted as Pab-aRNase J, corresponds to the
recently identified ortholog of the bacterial RNase J
(Figure 1B) (24,36). This protein has been shown to
have a highly processive 5'-end-dependent exonuclease
activity with a 5'-3' directionality (24). In the following
sections, we analyse the mode of ribonuclease cleavage
and substrate specificity of Pab-aCPSF1 as well as the
function of the N-terminal KH domains and C-terminal
protein dimer interface.

Pab-aCPSF1 has endo- and exoribonuclease activity

We investigated the enzymatic activity of recombinant 
Pab-aCPSF1 (untagged version) by performing assays in
enzyme excess using the well-characterized 64-nt sR47
RNA substrate, corresponding to a P. abyssi box C/D
guide RNA (24,34) (Figure 4). Incubation of
Pab-aCPSF1 with 5'-end-labeled triphosphorylated RNA
(5'p*pp RNA) yielded two major products of 21 and 59 nt
in length (Figure 4A). These products correspond to cleavages
after cytosines C21 and C59, which are located in the
only two CA dinucleotides in the sR47 RNA substrate
(Figure 4A). Assays with triphosphorylated 3'-end-labeled
(5'ppp RNAp*Cp) and uniformly labeled (5'ppp
RNA(U)*) substrates yielded two and four RNA
products, respectively, corresponding to cleavages at the
CA dinucleotides (Figure 4A). The addition of divalent
ions to the reaction buffer did not stimulate activity, nor
did the addition of EDTA inhibit the reaction. However,
the activity of the enzyme was strongly inhibited by
addition of 1,10-phenanthroline, a potent Zn2+ chelator
(Supplementary Figure S3A). Furthermore, the substitution
of conserved residues in the β-lactamase and β-CASP
motifs (H261A and H594A in motifs 2 and B, respectively,
Figure 1B) abolished ribonuclease activity
(Supplementary Figure S3B). The recently published crystal structures
of several aCPSF1 orthologs show that these histidines are
involved in coordinating two zinc ions that are essential
for catalysis (14,16,26). Altogether, these results reveal
that Pab-aCPSF1 is a bona fide β-CASP protein and
that the activity reported here is not due to a
contaminating ribonuclease. We performed similar
assays with sRkB, a 216 nt non-coding RNA recently
identified in P. abyssi (35) (Figure 5A). sRkB was
cleaved at nine positions: five corresponded to CA dinucleotides
and the other four to GC, CC, AG or AC

Figure 1. The archaeal metallo-β-lactamase β-CASP protein family. (A) Phylogenetic tree of the four β-CASP OGs. The β-CASP proteins with
reported structure and/or activity were added as landmarks: yeast and human CPSF 73 kDa (10), M. thermautotrophicus MTH1203 (14), P. horikoshii
PH1404 (16), M. mazei MM0695 (26), B. subtilis RNase J1 (48), Thermus thermophilus RNase J (21,49), P. abyssi PAB1751 (24), M. jannaschii
MJ0861, MJ0162 and MJ1236 (15), S. solfataricus SSO0386 (25). The P. abyssi β-CASP members are framed. Note that the
landmarked Candidatus Korarchaeum cryptofilum β-CASP member is not considered an aCPSF1b member in view of its sequence. (B) Schematic
representation of Pab-aRNase J (PAB1751) and Pab-aCPSF1 (PAB1868), homologs of bacterial RNase J (Top) and eukaryal CPSF-73 (Bottom),
respectively. The β-CASP and metallo-β-lactamase domains are highlighted in dark and light grey, respectively. The three β-CASP motifs (A-C), four
metallo-β-lactamase motifs (1-4) and N- and C-termini are indicated with respective features. Pab-aRNase J loop 1 and 2 insertions are indicated
(24). The asterisks (*) indicate the position of conserved amino acids that have been mutated in this study, H261A and H594A.

Table 1. Reported archaeal β-CASP proteins with their 5'-3'
exo- and/or endoribonucleolytic activity

β-CASP OG   Protein    Reference                    Endo-   5'-3' Exo
aCPSF1      PAB1868    This work                    +       +
            MJ1236*    Levy et al. (15)             +
aCPSF1b     MJ0162*    Levy et al. (15)                     +
aCPSF2      Sso0386*   Hasenöhrl et al. (25)                +
aRNase J    PAB1751    Clouet-d'Orval et al. (24)   -       +
            TK1409     Clouet-d'Orval et al. (24)   -       +
            MJ0861     Levy et al. (15)                     +

The proteins marked by asterisks were misidentified as aRNase J
homologs.



[Figure 2: Weblogo of the aCPSF1 C-terminus. Top sequence, Mm0695: IETRALTNLETVRLL. Bottom sequence, PAB1868 (residues 623-637): LSTRAPNNLDTIRLR, with the ΔCter deletion marked.]

Figure 2. Sequence conservation of the last 16 residues of the 110
members of aCPSF1 is shown using Weblogo representation
(http://weblogo.berkeley.edu). The C-terminal amino acid sequences of
M. mazei Mm0695 and of P. abyssi PAB1868 (Pab-aCPSF1) are
specified on the top and bottom, respectively. The residues establishing
interacting hydrogen bonds in the protein-protein interface of the
dimeric structure of Mm0695 (26) are highlighted in grey and the
β-sheets formed by these residues are indicated. The last 12 residues
that were deleted in the aCPSF1 ΔCter variant are indicated by a horizontal
arrow under the P. abyssi sequence.



dinucleotides (Figure 5A). Two CA dinucleotides located
in the highly stable P4 helix of sRkB were not cleaved,
suggesting a preference for single-stranded CA dinucleotides.
This conclusion is supported by results with the
sR47MutCG and sR47Mut21U variants, which are not cleaved
at position 21 (Figure 5B). C21 and A22 are embedded
in an extended RNA helix in the sR47MutCG variant and
C21 is replaced by a U in the sR47Mut21U variant
(Figure 5B, upper panel). Altogether, these results show
that Pab-aCPSF1 has endoribonuclease activity with a
preference for cleavage at single-stranded CA
dinucleotides.
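
The cleavage preference described above can be turned into a small Python sketch that lists candidate sites in a substrate and the labeled fragments expected from them. It is an illustration only: the function names and the toy sequence are hypothetical, the set of base-paired positions must be supplied by the user, and the rule is a simplification, since four of the nine sRkB cleavages occurred at non-CA dinucleotides. The 64-nt length and the C21 and C59 sites are those reported for sR47; the 3' fragment lengths ignore the extra nucleotide added by pCp labeling.

```python
def ca_sites(rna, paired=frozenset()):
    """1-based positions of the C in every unpaired CA dinucleotide."""
    rna = rna.upper().replace("T", "U")
    return [i + 1 for i in range(len(rna) - 1)
            if rna[i:i + 2] == "CA"
            and i + 1 not in paired and i + 2 not in paired]


def end_labeled_products(length, sites):
    """Lengths (nt) of the labeled fragment left by a cut after each site."""
    return {"5'-label": [p for p in sites],
            "3'-label": [length - p for p in sites]}


# Hypothetical 15-nt sequence with CA dinucleotides at C5 and C12.
toy = "GGAUCAGGCUUCAUG"
print(ca_sites(toy))
# Treating positions 12 and 13 as base-paired removes the second site.
print(ca_sites(toy, paired={12, 13}))
# sR47: 64 nt, cleavages after C21 and C59.
print(end_labeled_products(64, [21, 59]))
```

For sR47 the sketch returns 5'-labeled fragments of 21 and 59 nt, matching the two major products observed with the 5'p*pp RNA.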

To test whether the phosphorylation state of the 5'-end
of the RNA substrate affects Pab-aCPSF1 activity, we
performed assays with 5' monophosphorylated sR47
(Figure 4B). The degradation of 5'p RNA contrasts
markedly with that of 5'ppp RNA owing to the production
of GMP or UMP (Figure 4B). The 5'p*RNA
generated radiolabeled GMP, which corresponds to the
5' terminal base in the sR47 substrate, whereas the uniformly
labeled 5'p RNA(U)* generated radiolabeled
UMP (Figure 4B). Comparable results were observed
with the sRkB RNA substrate (Figure 5A). The 3'-end-labeled
5'p RNA yielded radiolabeled p*Cp, but
the 3'-end-labeled 5'ppp RNA did not. These results strongly suggest
that Pab-aCPSF1 has a 5' monophosphate-dependent
5'-3' exoribonucleolytic activity. This dependence is
strict because neither 5'ppp (Figure 4A and Figure 5A)
nor 5'hydroxyl (Figure 4C) transcripts can be degraded
exonucleolytically. However, it should be noted that the
distal products of endonucleolytic cleavage of the 5'ppp
and 5'hydroxyl substrates do not appear to be degraded
by the exoribonucleolytic activity as evidenced by the
absence of UMP production. This suggests that products
of endonucleolytic cleavage are somehow protected from
exoribonucleolytic digestion. Interestingly, the exoribonucleolytic
activity appears to be slowed or impeded by
RNA secondary structure (compare the production of
UMP* with the sR47, sR47MutCG and sR47Mut21U
substrates) (Figure 5B, lower panel). Finally, like other
exoribonucleases from the β-CASP family [see ref. (24)],
Pab-aCPSF1 can partially degrade 5'-end-labeled DNA
oligonucleotides to mononucleotides (Supplementary
Figure S3C). In conclusion, Pab-aCPSF1 has a dual
activity: an endoribonuclease activity that preferentially
cleaves at single-stranded CA dinucleotides, and an
exoribonuclease activity that is restricted to
5'-monophosphorylated RNA substrates.

Pab-aCPSF1 N-terminal and C-terminal extensions are
involved in RNA binding and protein dimerization

To investigate further the properties of Pab-aCPSF1, we
produced a variant deleted for the last 12 residues of
Pab-aCPSF1 (Pab-aCPSF1 ΔCter) (Figure 2). In the
crystal structures of M. mazei (Mm) and M.
thermautotrophicus aCPSF1 (14,26), these residues form
a network of hydrogen bonding interactions at the interface
of the dimeric holoenzyme. Furthermore, the interacting
residues correspond to a sequence motif that is
conserved in the aCPSF1 family (Figure 2). Gel filtration
shows that the ΔCter variant is mostly monomeric,
whereas the wild type protein is dimeric (Figure 6A, left
and middle panels), thus validating the role of the
C-terminus in dimerization. Note that the Pab-ΔCter recombinant
protein is highly sensitive to proteolysis in the
region linking the N-terminal KH domain and the core
β-CASP metallo-β-lactamase domains (Supplementary
Figure S4). We assayed the activity of the ΔCter variant
using the 5'p*pp RNA, 5'ppp RNA(U)* and 5'p RNA(U)*
substrates. Our data show that the exonucleolytic
activity of Pab-aCPSF1 ΔCter is impaired as evidenced
by the absence of UMP production with the 5'p
RNA(U)* substrate (Figure 6B, left panel). Although
endonucleolytic cleavage of 5'ppp RNA(U)* appears to
be weak, cleavages at C21 and C59 are clearly detected
with the 5'p*pp RNA and 5'p RNA(U)*.

To test the importance of the Pab-aCPSF1 N-terminus
containing two KH domains (Figure 1), we produced a
Pab-aCPSF1 ΔKH variant missing the first 179 residues.
Gel filtration shows that Pab-aCPSF1 ΔKH is dimeric
(Figure 6A, right panel). Thus, the N-terminal extension
is not involved in dimerization. We assayed the activity of
Pab-aCPSF1 ΔKH using the sR47 substrate (Figure 6B).
The pattern of digestion is comparable with wild type
(Figure 4A) except that the 59 nt RNA corresponding to

Figure 3. Congruence of the archaeal and aCPSF1 phylogenetic trees. The archaeal tree was constructed from a concatenated sequence of 53
ribosomal proteins. The archaeal aCPSF1 tree was constructed from sequences that were available at the time of analysis. Individual archaeal
clades are colored to facilitate the comparison between the two trees. 

Figure 4. In vitro activity of Pab-aCPSF1. The products of the reaction
were analysed on a 10% PAGE and in parallel on TLC (B, Bottom).
(A) Kinetic analysis of RNA cleavage of 5'-end-triphosphate sR47
(<1 nM) by an excess of Pab-aCPSF1 (6 µM) at 65°C for the indicated
times (0, 30 and 90 min). 5'-end-labeled sR47 RNA carrying a 5' γ-[32P]
[...] (5'ppp RNAp*Cp), lanes 7-10; uniformly labeled with α-[32P]UTP (5'ppp
RNA(U)*), lanes 13 to 16. Substrates incubated 90 min without
protein (C90) are shown in lanes 4, 10 and 16; RNase T1 ladder (T1)
in lanes 5 and 11; hydroxyl ladders (OH) in lanes 6 and 12. The
cleavage products are indicated by arrows and their position is
indicated in the sR47 schematic drawing. The boundaries of the
cleavage products are given in brackets. (B) Kinetic analysis of RNA
cleavage of 5'-end-monophosphate sR47 by Pab-aCPSF1 (6 µM) at
65°C for the indicated time. 5'-end-labeled sR47 RNA carrying a 5'
α-[32P] (5'p* RNA) is shown in lanes 1-4; uniformly labeled sR47 (5'p
RNA(U)*) in lanes 6-8 and (C90) and (OH) in lanes 4 and 5, respectively.
(C) Kinetic analysis of RNA cleavage of 5'-end hydroxyl uniformly
labeled sR47 (α-[32P] UTP, 5'OH RNA(U)*) by Pab-aCPSF1
(7 µM) at 65°C at the indicated times (lanes 1-3).



a cleavage at C59 was not produced. We conclude that the
N-terminal KH domains are not necessary for catalytic
activity, but are most likely involved in the recognition
of certain specific sites. Because KH domains are
predicted to bind nucleic acids (14,16), we examined the
capacity of the ΔKH variant to bind sR47 RNA by electrophoretic
mobility shift assay (EMSA) (Figure 6C). On
incubation with increasing protein concentration, three
major distinct RNP complexes were detected with wild
type Pab-aCPSF1. sR47 was fully shifted at a protein
concentration of about 1 µM. 5'-triphosphate and
5'-monophosphate RNA bound with similar affinity, suggesting
that the nature of the 5'-end is not important for
binding. We also analysed the RNP patterns of the
sR47MutCG and sR47Mut21U substrates, which are
not cleaved endonucleolytically at position 21.
Preliminary data showed similar overall binding affinities
and high molecular weight RNP complexes as observed in
Figure 6C (data not shown). However, the intensity of
each RNP band was somewhat different from that in the sR47
EMSA, which did not permit a clear conclusion. This
leaves open the question of whether binding and endonucleolytic
activity can be uncoupled, to be addressed in future studies.
The affinity of the ΔKH variant is slightly lower and
the higher molecular weight RNP complexes are less
stable as evidenced by smearing in the gel (Figure 6C).
We also analysed the ΔCter variant by EMSA. This
variant is severely impaired in its capacity to bind RNA,
suggesting that dimerization of the holoenzyme is important
for RNA binding (Figure 6C). Altogether, these
results show that the dimerization of Pab-aCPSF1 is important
for exoribonuclease and RNA binding activity,
whereas the KH domains participate in endoribonucleolytic
cleavage at certain sites and are important for
the stability of high molecular weight RNP complexes.



DISCUSSION 

In this study, we systematically identified β-CASP proteins
in Archaea and classified them according to sequence
similarities to determine their phylogenetic relationships
(Figure 1 and Supplementary Figure S1) and their
taxonomic distribution (Supplementary Figure S2 and
Figure 7). Among the seven archaeal β-CASP OGs that
we identified, one is related to the bacterial RNase J
(aRNase J) and three are related to the eukaryal CPSF-73
(aCPSF1, aCPSF1b and aCPSF2). aRNase J, which is
distributed exclusively in the Euryarchaeota, includes
three members known to have 5'-end-dependent
exonucleolytic activity (15,24,25) (Table 1). The aCPSF-like
proteins are clearly divided into three clusters: aCPSF1
includes extremely well conserved members in all
Archaea whose genomes have been sequenced (this work);
aCPSF2 groups more divergent members that are
widespread in the Archaea and includes some that were
previously misidentified as RNase J orthologs (15,25);
aCPSF1b members are only present in the
Methanococcales. aCPSF1b and aCPSF1 are closely related, but
aCPSF1b lacks the N-terminal extension containing two
KH domains. In summary, our phylogenetic analysis
rectifies the misidentification of certain archaeal β-CASP
proteins as aRNase J homologs (15,25) and clarifies their
evolutionary origin.



Nucleic Acids Research, 2013, Vol. 41, No. 2 1099 



[Figure residue: secondary-structure drawings and gel panels for sRkB RNA, sR47, sR47MutCG and sR47Mut21U; substrate labels 5'p*pp RNA, 5'ppp RNAp*Cp, 5'p RNAp*Cp and 5'p RNA(U)*; lanes T1, OH, control and time (min); products [1-59], [22-64], [1-21] and UMP*.]

Here, based on biochemical studies, we report that
Pab-aCPSF1 has both endonucleolytic and 5'-3'
exonucleolytic activity. In addition, the endonucleolytic
cleavage occurs in single-stranded RNA with a
pronounced preference for CA dinucleotides. We reveal that
the C-terminal homodimeric interface, initially identified
in the crystal structures of M. mazei and
M. thermautotrophicus members (14,26), is conserved amongst aCPSF1
homologs. Disruption of this interface in Pab-aCPSF1
results in a monomeric enzyme that has endoribonuclease
activity, but is deficient in exoribonuclease activity.
Similarly, protein interactions were shown to be
required for full activity of eukaryotic CPSF-73, which
forms a heterodimer with CPSF-100 (an inactive
CPSF-73 paralog) (37). Deletion of the N-terminal KH
domains of Pab-aCPSF1 abolishes endoribonucleolytic
cleavage at certain specific sites and destabilizes high
molecular weight RNPs, without affecting
exoribonucleolytic activity or the dimeric state of the enzyme.
Given the general prevalence of KH domains in proteins
associated with transcriptional and translational
regulation (PNPase, the exosome, NusA and ribosomal
proteins) (38), it seems likely that they have an
important role in aCPSF1 specificity. Mj-aCPSF1 has been
recently reported to have endonucleolytic but not
exonucleolytic activity (15), similar to the activity of the
Pab-aCPSF1 ΔCter variant studied here. Despite this
apparent inconsistency, we believe that most aCPSF1
members are likely to have endo- and exoribonucleolytic
activity because this is a property of both CPSF-73 and
Pab-aCPSF1 (10-12).
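
The preference for single-stranded CA dinucleotides described above can be
illustrated with a minimal Python sketch that lists candidate CA sites in an
RNA and, optionally, keeps only those lying in unpaired regions of a
dot-bracket secondary structure. The function name `find_ca_sites`, the toy
sequence and its structure are hypothetical and chosen for illustration only;
the sketch flags candidate sites and makes no prediction about which of them
Pab-aCPSF1 actually cleaves.

```python
def find_ca_sites(rna, structure=None):
    """Return 1-based positions of the C in every CA dinucleotide.

    rna       -- RNA (or DNA) sequence; T is read as U, case is ignored.
    structure -- optional dot-bracket string of the same length, where '.'
                 marks an unpaired nucleotide (assumed input format).
                 If given, only CA dinucleotides whose C and A are both
                 unpaired are reported.
    """
    rna = rna.upper().replace("T", "U")
    if structure is not None and len(structure) != len(rna):
        raise ValueError("structure and sequence must have the same length")

    sites = []
    for i in range(len(rna) - 1):
        if rna[i:i + 2] != "CA":
            continue
        if structure is not None:
            # Skip sites where either nucleotide is base-paired.
            if structure[i] != "." or structure[i + 1] != ".":
                continue
        sites.append(i + 1)  # convert 0-based index to 1-based position
    return sites


# Hypothetical 12-nt hairpin: 3-bp stem, 4-nt loop, 2-nt 3' tail.
toy_seq = "GGCACAUGCCCA"
toy_fold = "(((....))).."

print(find_ca_sites(toy_seq))            # all CA dinucleotides
print(find_ca_sites(toy_seq, toy_fold))  # only fully unpaired CA sites
```

In the toy hairpin, the CA starting at position 3 is excluded by the structure
filter because its C lies in the stem, whereas the CA sites in the loop and in
the 3' tail are retained.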

All archaeal β-CASP proteins characterized to date,
except for Mj-aCPSF1 (15), display a 5'-3'
exoribonucleolytic activity that is dependent on the 5' phosphorylation
state of the substrate [(15,24,25), this work]. Previous
biochemical work showed that the translation initiation
factor a/eIF2 binds to and protects RNA with
5'-triphosphorylated ends from degradation in the
Crenarchaeon S. solfataricus (25,39). This observation
suggests parallels to the principal mechanisms of 5'-3'
RNA decay in Bacteria and Eukarya (1), in which
Nudix hydrolases in Bacteria (40,41) and decapping
enzymes in Eukarya (42) trigger mRNA degradation by
producing 5'-monophosphate ends. Nevertheless,
comparable enzymes remain to be discovered in the Archaea. In
conclusion, the nature of the substrate 5'-end (tri- versus
mono-phosphorylated) emerges as a major determinant in
the activity of β-CASP ribonucleases.

To highlight the prospective archaeal RNA degradation
machinery, we have summarized the distribution of the
archaeal β-CASP ribonucleases, together with the
archaeal exosome and aRNase R, which have 3'-5'
exonucleolytic activity (1,43,44) (Figure 7). It should be
noted that the Crenarchaeota, Thaumarchaeota and
Korarchaeota, which were recently described to be part
of the 'TACK' superphylum and speculated to be at the
origin of Eukarya (45), all have the same combination of
RNA degrading enzymes (aCPSF1, aCPSF2 and archaeal
exosome). In contrast, distribution in the Euryarchaeota is
heterogeneous apart from the ubiquitous aCPSF1
homolog. As previously reported (13), the exosome is
missing from Methanococcales, Methanomicrobiales and
Halobacteriales, illustrating the divergence in the
Euryarchaeota. In the Halobacteriales, the emergence of
an RNase R-like protein is believed to compensate for this
deficiency (46) (Figure 7). In the Methanococcales, the
absence of the exosome correlates with the presence of
aCPSF1b homologs, suggesting a possible functional
link between the exosome and β-CASP proteins.

In conclusion, the enzymatic properties of aCPSF1
members are comparable with those of eukaryal CPSF-73,
including 5'-end-dependent exoribonuclease activity and
an endoribonuclease activity with a preference for
single-stranded CA dinucleotides. The strict conservation of
these orthologs throughout the Archaea suggests a
fundamental role in RNA metabolism. An analogy can be made
with the eukaryal CPSF-73, which is a component of the
machinery required for mRNA 3'-end maturation and
termination of RNA polymerase II transcription (9,47). Our
results suggest that a CPSF-like β-CASP protein was
present in the last common ancestor of Archaea and
Eukarya. We speculate that the highly conserved
aCPSF1 might be an active component of an essential
RNA-processing complex involved in mRNA degradation
and/or 3'-end processing and transcription termination.
By analogy to CPSF-73, which is part of a
multicomponent RNP, clues to the function of the archaeal
homolog might come from future studies aimed at
identifying archaeal complexes containing aCPSF1 homologs.